AUTHENTICATION OF DIGITAL DATA WORKS USING SIGNATURES AND WATERMARKS

Field of the invention

This invention relates to a method of authenticating digital data works, particularly to enable 
any unauthorised alterations to that work to be rendered readily detectable. 


Description of the Prior Art 

Systems based upon digital data are becoming universal and indispensable: digital data
passing between computers, digital telecommunications, digital audio, digital cameras, and
the convergence of many of these individual components driven by the internet together set
the prior art context to this invention.

There are many applications for a technique that can enable any unauthorised alterations to
digital images or digital audio to be rendered readily detectable. For example, it is today
very easy to tamper with a digital photograph using image manipulation software, rendering
the authenticity of any digital photograph questionable. This has serious implications for the
use of digital photographic evidence in criminal litigation, for example. It may therefore be
advantageous to be able to assert that the integrity of any given digital photograph can be
assured. Similarly, it is becoming common to archive documents, including legal contracts
and financial instruments, by imaging and storage on non-erasable digital media, such as
WORM. There is a pressing need to ensure that those digital records are tamper evident.
There are similar issues pertaining to digital audio and video. For example, where digitally
recorded speech is to be used in evidence, typically to confirm the existence and terms of an
oral agreement, then validation of the integrity of the recording is particularly helpful.

There are various established approaches to ensuring data integrity in the
telecommunications and digital audio fields. One example is the use of error correction
techniques relying upon checksums. However, these techniques are designed to ensure that
a digital signal generated by a device, for example a CD player, is an accurate transmission
or reproduction from a source, for example the data stored within the CD. That is different
from being able to detect if, without access to the original CD, any duplicate made of the CD
is a completely accurate reproduction of the original CD. The present invention is directed to
solving this latter problem, i.e. whether the integrity of a digital data work has been
compromised. Hence, the present invention is not directed to manipulating small units of
digital data to enable that data to carry information inherent to the proper comprehension of
the digital data itself but instead to enabling any unauthorised modification or alteration to a
digital work to be readily detectable.

Digital signatures are one technique widely used to confirm the integrity of digital
information, particularly to verify the integrity of digital information which has been
transmitted from one location to another. Currently, it is possible to include a simple
signature in the header of the data file of a digital data work. The header might typically
comprise a checksum derived from the contents of the data file so that any alteration of the
contents inevitably leads to a mismatch with the checksum; the mismatch can readily be
detected, enabling the alteration to be detected. However, it can be relatively easy to strip
out the header checksum entirely, in which case one could not establish the integrity of the
data.

Hence, the conventional, signature based approach to checking the integrity of a digital file
is as follows: the person who wishes to authenticate a digital file calculates a signature by
applying an appropriate algorithm to the data values of the file. This signature is then either
appended at the end of the original file or written into a separate file. The signature is
transmitted along with the file to anyone who wishes to use the file and have corroborating
evidence of its authenticity. The verification part of the process is carried out by the
receiver of the file. The receiver must have knowledge of the algorithm used to calculate the
signature together with any key that governs that algorithm. Using these tools the signature
is calculated for the received file. The value of the signature that was transmitted is then
compared with the signature that has just been calculated. Equality of values indicates that,
to whatever probability the method entails, the transmitted file is indeed an identical copy of
the original file.
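The conventional append-and-compare scheme described above can be sketched as follows. This is only an illustration: the choice of HMAC-SHA-256 as the keyed signature algorithm and the function names `sign`, `append_signature` and `verify_appended` are assumptions, not part of the prior art being described.

```python
import hashlib
import hmac

DIGEST_LEN = 32  # length in bytes of an HMAC-SHA-256 signature (assumed algorithm)

def sign(data: bytes, key: bytes) -> bytes:
    """Calculate a keyed signature over the data values of the file."""
    return hmac.new(key, data, hashlib.sha256).digest()

def append_signature(data: bytes, key: bytes) -> bytes:
    """Sender side: append the signature at the end of the original file."""
    return data + sign(data, key)

def verify_appended(blob: bytes, key: bytes) -> bool:
    """Receiver side: recalculate the signature and compare with the one received."""
    data, received = blob[:-DIGEST_LEN], blob[-DIGEST_LEN:]
    return hmac.compare_digest(sign(data, key), received)

signed = append_signature(b"terms of the agreement", b"shared-key")
print(verify_appended(signed, b"shared-key"))               # intact copy
print(verify_appended(b"X" + signed[1:], b"shared-key"))    # altered copy
```

Note that the appended signature is trivially separable from the data (`signed[:-DIGEST_LEN]` is the bare file), which is the weakness that embedding the signature in the work itself is intended to address.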

A further possibility as described by EP-A-0402210 is to write the signature into a secure
hardware device. EP-A-0402210 describes verifying the integrity of a message by looking
at the unique signature which can be derived from a part of the message and comparing it to
the signature derived from the original of the message, stored in a secure location.

Reference should also be made to EP-A-96920963.4, which discloses the approach of
enabling the integrity or authenticity of a digital data work to be established by changing the
digital data work according to a particular algorithm so that some or all of its constituent
parts possess a measurable characteristic; that measurable characteristic is changed if any
alterations to the digital data work are subsequently carried out. That change can then be
detected using a detection process.

Reference may also be made to US 5613004 which discloses embedding an invisible
watermarked message into an image; a hash of that message is also embedded, enabling one
to detect whether the message has been altered. The signature is therefore not of the work as
such but the message embedded into it.

Statement of the invention

In accordance with the present invention, a method of authenticating a digital data work
comprises the steps of:

calculating a signature for an original, unaltered version of the digital data work;
modifying some or all of the original, unaltered version of the digital data work in
dependence on the signature, to generate a modified work, such that an unauthorised
alteration to the modified work measurably changes the modified work.

Hence, the essence of the invention is to enable the integrity or authenticity of a digital data
work to be established by calculating a unique signature or hash for the unaltered work and
embedding that signature in some manner back into the unaltered work in a way that
actually changes the work itself. That contrasts with the conventional approach of merely
appending a signature as a header or some other adjunct to the data work. It contrasts also
with embedding a message and a hash of that message to guarantee the integrity of the
message. Although the present invention is based on the concept taught in
EP-A-96920963.4 of actually altering the digital data work, this invention goes beyond that
since EP-A-96920963.4 does not disclose the concept of altering the digital data work by
embedding in it a unique signature derived from the original data work itself.



WO 01/23981 



PCT/GB00/03712 



The signature is embedded into the digital work itself such that any usage of the original file
will typically be unaffected by the signature calculation; indeed it is unlikely that any user
will be aware of the presence of the embedded information.

The approach of the present invention has two important consequences. The first is that an
additional level of security is possible over conventional techniques because the location and
manner in which the signature is embedded into the original data work allows for a further
form of encryption. The second consequence is that the parts of the data into which the
signature is encoded have to be handled with specially devised algorithms which will allow
them to contribute to the overall signature in such a way that the contributions will not be
affected by the signature encoding process applied to the digital image.

A preferred method applies equally well to image files compressed in the JPEG format and
to audio files compressed in the MP3 format, although in each of these cases the encoding of
the signature into the data requires additional procedures to avoid it being damaging to the
quality of the data work.

In one possible implementation of this method, a signature is calculated using only part of a
digital data file, and the calculated signature is embedded into this selected part. Thus, for
instance, a document may be scanned into digital form and then areas of the document that
are particularly significant (e.g. hand-written signatures on a cheque or contract) may be
chosen. For each of these significant areas a signature may be calculated and embedded into
the data. Thus an alteration to one of these selected areas could be detected whilst the
remainder of the document could be certified as unchanged.

A further variant occurs in the case of JPEG files. In this case it may be that it is important
in some context to have a signature that is based on the whole file including the header.
However, when the signature is embedded, it has to be embedded in a particular part of the
data (see below) because some parts of the data carry critical information in a form such that
any alteration renders the information meaningless.

Likewise with MPEG files: a signature may be based purely on the data, or on the whole file
including the header, but the embedding has to be carried out on the data values alone.

The nature of the signature depends upon the requirements in the context under
consideration. Algorithms for calculating signatures, known as 'hashing functions', have
been studied at length and there are secure versions in common use. One such algorithm,
known as SHA-1, invented in 1994, has a high level of security. A signature of this form is
really a highly condensed digest of the digital data; it may be that a megabyte of information
is mapped onto 64 bits of signature. Clearly there are many possible images that map onto
any one signature, but with a good hashing algorithm it is virtually impossible to find
another image or audio file that has the same signature as that calculated from the original
data. In the method envisaged for one embodiment of the present invention, an algorithm
such as SHA-1 may be ideal for circumstances in which any alteration to the original data,
however small, is to be detected (the "Type A" method of authentication as described in
this specification).

The present invention also deals with the possibility that a file may be altered to a small
degree but still be acceptable ("Type B"). Two common situations occur where Type B
processes might be useful. The first is where files are compressed but only to the extent
that image or audio reproduction is scarcely perceptibly altered. The second is where image
files are realised in hard copy and hence have to go through a screening or equivalent
process and yet are of sufficiently high quality to be acceptable as authentic versions of the
original. In these cases this specification describes how a signature might be used to provide
a guarantee of integrity and a measure of the extent to which changes have occurred.

In this latter, Type B case, the signature needs a property which is specifically avoided in
algorithms of the SHA-1 type. That is the property that a small change in the image will
produce a small change in the signature. The reason for this is that using SHA-1 an
operation such as compression might give a totally different value of the signature from that
corresponding to the original. In the Type B method described in this specification, the
variation in signature would be usable as a measure of the degradation that has occurred.

This latter type of signature has the drawback that it is easier to produce two images
with the same signature, but by careful choice of function this risk can be made acceptably
small.

These 'approximate' signatures for Type B applications will typically use sets of orthogonal
functions to develop an image description. These functions will vary according to a key so
that it will not be possible to develop a general method for making known modifications to
the signature. Alternatively, the signature may be calculated by taking a random selection of
pixels from the data, selecting the pixels in such a way that anyone attempting to alter the
file would have difficulty in avoiding all of the sampling points.
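One way an 'approximate' signature of this kind might be built is sketched here. It is an assumption-laden illustration, not the method of the specification: Walsh functions stand in for the "sets of orthogonal functions", the key picks which of them are used through a seeded pseudo-random generator, and the names `walsh`, `approx_signature` and `distance` are hypothetical. The data length is assumed to be a power of two.

```python
import hashlib
import random

def walsh(r: int, i: int) -> int:
    """Value (+1 or -1) of the r-th Walsh function at position i."""
    return -1 if bin(r & i).count("1") % 2 else 1

def approx_signature(values: list[int], key: bytes, n_coeffs: int = 16) -> list[float]:
    """Project the data onto key-selected orthogonal functions (assumed scheme)."""
    n = len(values)  # assumed to be a power of two so the functions are orthogonal
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    rows = random.Random(seed).sample(range(n), n_coeffs)
    return [sum(v * walsh(r, i) for i, v in enumerate(values)) / n for r in rows]

def distance(sig_a: list[float], sig_b: list[float]) -> float:
    """Largest coefficient difference, usable as a crude measure of degradation."""
    return max(abs(a - b) for a, b in zip(sig_a, sig_b))

original = [(i * 7) % 256 for i in range(256)]
slightly_changed = [v + (1 if i % 3 == 0 else 0) for i, v in enumerate(original)]
print(distance(approx_signature(original, b"key"),
               approx_signature(slightly_changed, b"key")))
```

Because each coefficient is an average of signed data values, changing every value by at most one unit can move any coefficient by at most one unit, which is the "small change in, small change out" behaviour required for Type B and deliberately absent from SHA-1.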

In one embodiment, the original digital data work is modified not simply by a signature, but 
also by an externally generated code. That code may form part of a copyright management 
system. 


In another aspect of the invention, there is provided a method of detecting any alteration of a
digital data work, to which the authentication method of the present invention has been
applied, comprising the steps of:

reading a signature embedded into the modified work;
calculating a signature for the modified work;
comparing the embedded signature read from the work with the calculated
signature and determining that the modified work has been altered if the embedded signature
does not match or otherwise correspond to the calculated signature.

This enables the authenticity of the digital data work to be confirmed or rejected.

In further aspects of the invention, there is provided a digital data work to which the
authentication method encompassed by the invention has been applied; a computer program
operable to authenticate a digital data work using such an authentication method; a computer
program operable to detect any alteration to a digital data work to which such an
authentication method has been applied; and, finally, digital media pre-recorded with such a
computer program.

Brief Description of the Drawings

The invention will be described with reference to the accompanying drawings in which:

Figure 1 is a flow diagram of the essential steps performed in one embodiment of the present
invention.

Detailed Description
Signature Embedding

In a preferred embodiment of the present invention, the embedding of the signature into
original data is performed in such a way that (i) the file formats of the original data are
unchanged and (ii) any changes in the file data will be sufficiently small as to be
imperceptible when the files are realised as images or sound clips. The nature of this
embedding depends on whether we are considering Type A or Type B: Type A
authentication (as noted above) is designed to detect any modifications however small,
whereas with Type B authentication, the embedded signature may have to survive minor,
legitimate manipulation.

Referring now to Type A methods, some spatial domain (as opposed to frequency domain)
modification of the data work must take place in the actual process of authentication, but the
number of sites at which data is modified to embed a signature must be kept to a minimum.
In order to give added security to the process these sites are selected by a process which is
dependent on a crypto key unique to a user. Obviously, many possible means of selection
are available. Random number generators offer a simple method. An alternative method
used in previous fingerprinting applications (such as EP-A-96904936.0) is the selection by
the use of permutations.
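Key-dependent site selection with a random number generator might look like the following sketch. The function name `select_sites` and the derivation of the seed from a SHA-256 hash of the key are assumptions made for illustration; Python's `random.Random` is not a cryptographically secure generator, so a real system would substitute a keyed cryptographic generator or the permutation method referred to above.

```python
import hashlib
import random

def select_sites(key: bytes, n_values: int, n_sites: int) -> list[int]:
    """Pick n_sites distinct positions in a file of n_values data values.

    The same key always yields the same positions, so the detector can
    find the embedding sites again without any side information.
    """
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")  # assumed key-to-seed mapping
    rng = random.Random(seed)
    return rng.sample(range(n_values), n_sites)

# 128 embedding sites in a file of one million values
sites = select_sites(b"user-crypto-key", 1_000_000, 128)
print(len(sites), len(set(sites)))
```

Sampling without replacement keeps the sites distinct, so each bit of the signature has its own data value.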

The embedding of a signature for JPEG files requires a greater degree of differentiation of
data. In the JPEG format, part of the information is encoded in the form of 'Huffman' code,
a variable length coding in which values that frequently occur are encoded with shorter
length codes than those of rarer occurrence. The problem with this sort of code is that it
cannot be modified to contain information without rendering the code unreadable. For this
reason the embedding of the signature in a JPEG file has to be carried out in the part of the
file which simply gives the magnitude of the data values after the discrete cosine
transformation (DCT). These changes will affect the quality of the information to a minor
degree. The perceptibility of these changes is minimised by selecting from areas with a large
number of non-zero coefficients in the DCT. The actual selection may be governed by the
encoding key, adding a further level of security. These areas correspond to parts of images
where there is a lot of change present rather than to smooth areas and are areas where
alterations may be carried out with little effect on image quality.

For JPEG data, a signature could be calculated for a complete file, including the header, or
for the data part alone, or for some subset of the data. In each of these cases, however, the
restriction as to the mode of embedding that is described above must apply. As for
non-compressed files the image is divided into subsections. In the case of JPEG files
subdivision is carried out by insertion of "Restart Markers." A hash value is calculated using
all of the data that describes a given subsection without regard to the interpretation of that
data. In one embodiment the hash value includes the Huffman and Quantisation tables which
are mandatory parts of the header.

The situation with MPEG files involves another level of complexity. In this case the data
following the header is stored frame by frame, but in order to achieve compression there
exist certain reference frames which are placed at regular intervals and enable intermediate
frames to be encoded in a more economic manner. One method of handling the signature is
by calculating the signature value on all frames but restricting the embedding to the
reference frames in a manner similar to that employed for JPEG files.

MP3 files (defined in ISO/IEC 11172-3), like JPEG files, have a compressed format with
variable length code. Again like JPEG files it is essential to modify values in such a way that
whilst there is a mild degradation of audio quality there is no risk to the whole structure and
interpretation of the files.

As in previous cases an MP3 file is subdivided into subsections, a hash value calculated for
each subsection and then embedded back into that subsection.

In the case of MP3, data is divided into fixed length subsections as part of the compression
algorithm. This is because the audio data is analysed into a range of frequencies so that
certain psychoacoustic effects may be taken into account to allow removal of data which will
produce sounds that are masked by louder sounds on neighbouring frequencies. The data is
then expressed as amplitudes of a range of frequencies, complicated by the inclusion of
scaling factors.

Two embodiments are cited here. In the first the hash values are embedded in the "scale
factors" which are described in the MP3 specification. There are several scale factors in each
frame of MP3 data and so suitable factors can be chosen for amendment, chosen in such a
way as to produce minimal perceptual impact. In the second the hash value is embedded by
modifying the "preflag" (see MP3 spec), a quantity which occurs more sparsely than scale
factors, but nonetheless a quantity which can be modified without damaging degradation to
the audio quality.

With Type B authentication, the embedded signature may have to survive minor
manipulation, as mentioned previously. In this case the signature will be embedded in the
form of an invisible fingerprint or watermark (see EP-A-96904936.0). The essence of the
watermarking is that a large number of data values are affected but only modified by a very
small amount. The result is that if such a file were to be compressed, for instance, the
embedded signature could still be read from the watermark and the signature itself,
calculated from the data in the file, would have an approximately equal value.

A fundamental feature of the method of authentication described herein is that alterations
may be detected if data is cropped, or copied and pasted from authenticated files.

Methods of authentication have been in use in which, as described in this patent, a hash
value is obtained to describe a set of data, and the hash value is dependent upon a key which
should be unique to the user. The hash value is then appended to the file or, as in this
proposal, embedded back into the file. In some cases the hash value describes a complete
file and hence although an alteration may be detected in the file there is no means to
determine the location in which the alteration has been made.

In some cases, a file may be subdivided into sections and a hash value determined for each
section. In these cases it will be possible if an alteration has occurred not only to detect that
such an alteration has taken place, but also to locate the alteration within one of the
subsections.

Two possible weaknesses occur in the above scenario. The first is that cutting and pasting
may not be detected, the second that cropping of files may not be detected. This is best
illustrated by examples.

Suppose that authentication is used to protect a set of scanned cheques. The authentication
may subdivide the image into convenient subsections, calculate a hash value for each
subsection and embed the hash value back into the file. The manner of embedding the hash
value may depend upon a key which is unique to the user of the software, and this key will
be used to select the sites at which the hash value is embedded. The whole set of cheques
may be authenticated with the same key. If one of the areas of subdivision is the section of
the cheque which indicates the amount to be transferred, this area will be treated exactly the
same on each cheque in the sense that the same hash function will be used and the same
embedding points selected to embed the hash value. If now this important area of the
cheque is copied from one cheque onto another the presence of the forgery will not be
detected because the hash value will be the one that matches the data.

The second problem, that of cropping, is a serious issue for images and audio clips. A
forensic image may be totally different in interpretation if some part is cropped, perhaps
some bystander being removed from a scene of criminal activity. The meaning of an audio
clip may be reversed by omission of some introductory qualifying phrase. Now again,
preceding patents as described would confirm the authenticity of a given file even if there
were parts omitted, provided that those parts were complete subdivisions of the original.

There are several embodiments of the present invention which overcome the above
objections. In one embodiment the hash value calculated for a given subdivision of an image
or audio clip includes a value which indicates the position in the file of that subdivision. For
example, if an image is divided into rectangles the distance of each edge of the rectangle
from the edges of the original file might be included in the hash value. If then any cropping
of the image occurs the distances to the edge would be altered and the hash value falsified.
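A position-dependent hash of this kind could be sketched as below. The function name `block_hash`, the use of SHA-256, and the 4-byte encoding of the four edge distances are illustrative assumptions only.

```python
import hashlib

def block_hash(pixels: bytes, left: int, top: int, right: int, bottom: int,
               key: bytes) -> bytes:
    """Hash one rectangular subdivision together with its distances to the image edges.

    left/top/right/bottom are the distances (in pixels) from each edge of the
    rectangle to the corresponding edge of the original, uncropped image.
    """
    h = hashlib.sha256()
    h.update(key)
    for distance in (left, top, right, bottom):
        h.update(distance.to_bytes(4, "big"))
    h.update(pixels)
    return h.digest()

block = bytes(range(16))
before_crop = block_hash(block, 0, 0, 12, 12, b"key")
after_crop = block_hash(block, 0, 0, 8, 12, b"key")   # 4 columns cropped from the right
print(before_crop == after_crop)
```

The pixel data of the block is identical in both calls; only the recorded distance to the right-hand edge differs, and that alone is enough to change the hash value.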

In another embodiment the hash value of any subsection depends upon values in
neighbouring subsections. The number of values can be chosen according to the size of area
within which an alteration may be detected, and upon the probability of false values giving
rise to the same hash value. In this case if an area were to be cut and pasted the incorrect
values for surrounding areas would corrupt the hash value, allowing detection of the
counterfeit.
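The neighbour-dependent variant might be sketched as follows for a one-dimensional sequence of subsections. The name `neighbour_hashes`, the parameter `n_border` (how many neighbouring values are mixed in) and the use of SHA-256 are assumptions for illustration.

```python
import hashlib

def neighbour_hashes(blocks: list[bytes], n_border: int = 4) -> list[bytes]:
    """Hash each subsection together with a few values from its neighbours."""
    hashes = []
    for i, block in enumerate(blocks):
        prev_tail = blocks[i - 1][-n_border:] if i > 0 else b""
        next_head = blocks[i + 1][:n_border] if i + 1 < len(blocks) else b""
        h = hashlib.sha256()
        # Length-prefix each part so the boundaries between parts are unambiguous.
        for part in (prev_tail, block, next_head):
            h.update(len(part).to_bytes(4, "big"))
            h.update(part)
        hashes.append(h.digest())
    return hashes

cheque_a = [b"payee: A", b"amount01", b"date: 01"]
cheque_b = [b"payee: B", b"amount01", b"date: 02"]
# The "amount" subsection is byte-for-byte identical, yet its hash differs
# between the two cheques because the surrounding values differ.
print(neighbour_hashes(cheque_a)[1] == neighbour_hashes(cheque_b)[1])
```

A subsection pasted from one cheque into another therefore arrives with a hash that no longer matches its new surroundings.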

Detail of Type A Authentication: detection of any change, however small

Suppose we have a set $D$ of digital values $\{d_i\}$.

Each user has a key, $K$, and this key is used to select a subset $D_e$, small in comparison
with $D$, into which the signature will be embedded.

Thus $D = D_u + D_e$    (1)

where $D_u$ is the set of values that will be unchanged.
Thus we might embed 128 bits of signature into a 1 megabyte file.

An algorithm, A, is devised which maps a set of values onto a single value, S, the signature
of the data. At its simplest this might simply be a case of adding each pixel value
multiplied by the x and y coordinates and then neglecting the overflow if values exceeded
128 bits. Or, for an audio file, multiplying the amplitude value by the position in the file and
accumulating values similarly to image files.
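The simplest form of algorithm A mentioned above can be sketched directly. The function name `simple_signature` and the choice of 1-based coordinates (so that the first row and column also contribute) are assumptions; the specification does not say how coordinates are numbered.

```python
def simple_signature(pixels: list[list[int]]) -> int:
    """Sum of pixel value times x times y, keeping only the low 128 bits."""
    total = 0
    for y, row in enumerate(pixels, start=1):       # 1-based coordinates (assumed)
        for x, value in enumerate(row, start=1):
            total += value * x * y
    return total % (1 << 128)                        # neglect the overflow beyond 128 bits

image = [[1, 2],
         [3, 4]]
print(simple_signature(image))   # 1*1*1 + 2*2*1 + 3*1*2 + 4*2*2 = 27
```

A checksum of this kind is easy to compute but also easy to forge, which is why a cryptographic hash such as SHA-1 is suggested where security matters.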
We need to calculate the signature using the values in $D_e$ as well as those in the unmodified
set $D_u$. This will prevent alterations to the set $D_e$ going undetected. The problem is that the
values of $D_e$ which the authenticator will see may be different from those that the detector
sees because they have been modified in the process of embedding the signature. To cope
with this it is necessary to use modified values from $D_e$ to calculate the signature, modified
in such a way that the signature calculation will be unaffected by changes required for the
embedding process (see example below).

In mathematical terms a mapping, $M$, and a coding algorithm, $C$, are constructed such that:

$M$ maps the set of values in $D_e$ onto a new set, $D_{e,m}$.

$M(D_e) = D_{e,m}$    (2)

A signature for the data set $D$ is calculated. This signature uses the values in the part of the
data, $D_u$, which is unchanged together with the modified values of the embedding set, $D_e$.

$S = A(D_u + D_{e,m})$    (2A)

Once the signature is calculated it needs to be embedded into $D_e$ by a coding algorithm $C$. A
very simple method is to express $S$ as a binary string and embed it by changing values of $D_e$
to even numbers to represent '0' and to odd numbers to represent '1'.

Thus, the value of $S$ is coded into the values $D_e$ by algorithm $C$,

$C(S, D_e) = D_{e,c}$    (3)

The mapping $M$ must have the required property,

$M(D_e) = M(C(S, D_e))$    (4)

That is, the contribution to the signature by elements in set $D_e$ must be unaltered by the
modification of the values to include the coded signature.

Checking of the integrity of the data consists of:
(i) calculating $D_e$ from key, $K$;
(ii) mapping $D_e$ onto $D_{e,m}$;
(iii) calculating $S = A(D_u + D_{e,m})$;
(iv) deriving the embedded signature from the coded values $C(S, D_e)$;
(v) comparing the derived signature with the $S$ in (iii).
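The whole Type A cycle (select $D_e$ from the key, map with $M$, calculate $S$ with $A$, embed with $C$, then check) can be sketched end to end. Everything concrete here is an assumption chosen for illustration: $M$ is integer halving, $C$ sets the parity of each site value while keeping it inside its pair $\{2k, 2k+1\}$, $A$ is SHA-256 truncated to the number of sites, and the names `select_sites`, `signature_bits`, `authenticate` and `verify` are hypothetical. Data values are assumed to lie in the range 0 to 255.

```python
import hashlib
import random

def select_sites(key: bytes, n_values: int, n_sites: int) -> list[int]:
    """Derive the embedding set D_e (as positions) from the key K."""
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    return random.Random(seed).sample(range(n_values), n_sites)

def signature_bits(values: list[int], sites: list[int], key: bytes) -> list[int]:
    """A(D_u + D_e,m): hash unchanged values plus M-mapped site values."""
    mapped = list(values)
    for i in sites:
        mapped[i] = values[i] // 2          # M: halve, discarding the fraction
    h = hashlib.sha256(key)
    for v in mapped:
        h.update(v.to_bytes(2, "big"))
    digest = int.from_bytes(h.digest(), "big")
    return [(digest >> b) & 1 for b in range(len(sites))]

def authenticate(values: list[int], key: bytes, n_sites: int = 64) -> list[int]:
    """Embed the signature: C sets the parity of each site value to one bit."""
    sites = select_sites(key, len(values), n_sites)
    bits = signature_bits(values, sites, key)
    out = list(values)
    for i, bit in zip(sites, bits):
        out[i] = 2 * (values[i] // 2) + bit   # stays within the pair {2k, 2k+1}
    return out

def verify(values: list[int], key: bytes, n_sites: int = 64) -> bool:
    """Steps (i) to (v): recompute S and compare with the embedded parities."""
    sites = select_sites(key, len(values), n_sites)
    bits = signature_bits(values, sites, key)
    return all(values[i] % 2 == bit for i, bit in zip(sites, bits))

data = [(i * 37) % 256 for i in range(1000)]
marked = authenticate(data, b"user-key")
print(verify(marked, b"user-key"))
```

Because $M$ gives the same result for both members of a pair, the signature recalculated from the marked data equals the one calculated before embedding, and no value moves by more than one unit.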



Security 

There are 4 areas in which a form of encryption is used and where security can be added.

1. The method of mapping a key, $K$, onto a selection algorithm which derives the set $D_e$ for
embedding the signature.

2. The selection of hashing algorithm, $A$, for calculating the signature.

3. The choice of coding algorithm, $C$, such that the set of values of $D_e$ is mapped onto a
new set of values.

Part of the security method is contained in the control of access to the software which adds
the signature. It is envisaged that the signature will be added at a small number of secure
sites but that the detection program will be widely distributed.

Now for most applications a fairly simple level of encryption will be more than sufficient to
deter any but the most sophisticated of attacks. However, if the detector is distributed it will
be possible to reverse engineer the algorithms. In any case the detector must contain the
algorithms $A$ and $C$ and the algorithm for selecting $D_e$.

Mapping of the key onto the selection of sites 

A key may be of virtually any length according to the level of security required. The
signature that is calculated may be converted into a binary string of virtually any length and
there is then a requirement that the number of sites chosen to embed the string must be equal
to that length.

One method of selection of suitable sites is the permutation method described in
EP-A-96904936.0, in the name of the present applicant, the text of which is incorporated by
reference into this specification. This method relies on an internal permutation to generate a
set of permutations which in turn identify a set of locations in a file. These locations are then
used to embed the coded signature. The values at these selected sites are then mapped onto
new values, by using an algorithm of the type $M$ above, and it is these modified values that
are used in the calculation of the signature.

Calculation of the Signature (algorithm 'A' above)

There are many available hashing algorithms with accepted levels of security. The SHA-1
algorithm, for instance, is used by PGP in its applications. Any such algorithm is acceptable
to produce a signature to be embedded. However, since the security is not provided entirely
by the hashing it is possible to use a simpler hashing algorithm or one with particular
properties, and still have a secure signature.

Encoding of Signature ('C' and 'M' above)

The simplest way to encode the signature, S, is probably to take the binary string and embed
it at the selected sites using coding, C, such that a '1' bit corresponds to an odd value and a
'0' bit corresponds to an even value. This type of coding ensures that no data value needs to
be changed by more than one unit. This fact, together with the fact that there is only a small
number of sites which need to be amended, ensures that the modifications are minimal and
certainly would not be visible.

The mapping, $M$, for this particular coding algorithm, can simply be the rule "divide by two
and discard the fractional part", provided that the coding never moves a value out of its pair
$\{2k, 2k+1\}$: an odd value is rendered even by subtraction and not addition, and an even
value is rendered odd by addition and not subtraction. This is best illustrated by an example.

If the original value is 25 and it is to be rendered even then the adjusted value must be 24 and
not 26 on the grounds that 25/2 = 24/2 neglecting the fractional part,

i.e. $M(D_e) = M(C(S, D_e))$    (7)

If $D_e = 25$, $C(D_e) = 24$, $M(C(D_e)) = 12$    (8)

$M(D_e) = M(25) = 12$    (9)

More sophisticated coding algorithms exist and can provide higher security.
For example, supposing that there is a restriction of the coding algorithm that it shall not
alter any value by more than one unit, as above, then the coding algorithm may again divide
the data values into pairs but the interpretation of each member of the pair may be varied
according to a selected rule. Again, illustrating by an example:

Suppose that each member of the data set has a value in the range 0 to 7. Suppose we had 4
sites that were used to embed the signature 1001.
Suppose the original values at these sites were 4,6,3,5.

Using the method described above, to embed 1001 these values would be modified to be
odd, even, even, odd respectively.

Thus 4,6,3,5 would be mapped onto 5,6,2,5.

The values used to calculate the signature would be half of each of the above values,
neglecting the fractional part, i.e. they would take the values 2,3,1,2 and this would be the
case both for the original and modified set.






Thus in the above system the data values are mapped onto coding values as below:

(2 → 0 to be read as '2 is mapped onto 0')

0 → 0, 1 → 1, 2 → 0, 3 → 1, 4 → 0, 5 → 1, 6 → 0, 7 → 1    (10)

However, we could equally well pair the numbers in a different fashion. Thus, for example,

0 → 0, 1 → 1, 2 → 1, 3 → 0, 4 → 1, 5 → 0, 6 → 0, 7 → 1    (11)

If we then wished to embed 1001 in data values originally 4,6,3,5 we would proceed as
follows:

The value 4 corresponds to the value 1 in (11) and so need not be changed.

The value 6 corresponds to the value 0 in (11) and so need not be changed.

The value 3 corresponds to the value 0 in (11) and so need not be changed.

The value 5 corresponds to the value 0 in (11) and so must be changed to 4, which
corresponds to 1 in (11). Note M(5) = 2 and M(4) = 2, so the value used for calculation of the
signature is unchanged.
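
The worked example for pairing (11) can be reproduced with the following sketch. It is
illustrative only: the names `PAIRING_11`, `embed` and `read_bits` are assumptions
introduced here, and the pairs are taken to be (0,1), (2,3), (4,5), (6,7) as in the text.

```python
# Pairing (11): data value -> coding value, for data values 0 to 7.
PAIRING_11 = {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1}


def embed(values, bits, pairing):
    """Embed `bits` into `values`, moving each value only within its pair (2k, 2k+1)."""
    coded = []
    for value, bit in zip(values, bits):
        if pairing[value] != bit:
            value ^= 1              # switch to the other member of the pair
        coded.append(value)
    return coded


def read_bits(values, pairing):
    """Read the embedded bits back from the coded values."""
    return [pairing[v] for v in values]


coded = embed([4, 6, 3, 5], [1, 0, 0, 1], PAIRING_11)
print(coded)                         # [4, 6, 3, 4]
print(read_bits(coded, PAIRING_11))  # [1, 0, 0, 1]
print([v // 2 for v in coded])       # [2, 3, 1, 2]
```

Only the last value is altered, and the values used for the signature (2,3,1,2) are the
same as for the original data.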

Generalising the above, if the data values range from 0 to n - 1, these values are grouped
in pairs. One member of each pair is made to correspond to the value 1, the other
corresponds to the value zero. The choice of which corresponds to 1 and which to zero
can be made by algorithms based on the key or the signature. From the security point of view
it is best to base the algorithm on the signature. For instance, if the signature were to be
expressed as a binary string, whenever a 1 occurred the pair of numbers might be allocated in
the order 01, whereas when a 0 occurs the pair might be allocated in the order 10. To add to
the security, any encryption might be used to map the signature onto an encrypted value.
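
A minimal sketch of a signature-driven allocation follows. It is an illustration under
stated assumptions rather than a definitive implementation: the function name
`pairing_from_signature` is hypothetical, and re-using the signature bits cyclically when
there are more pairs than bits is an assumption not taken from the text.

```python
from itertools import cycle


def pairing_from_signature(signature_bits, n):
    """Build a data value -> coding value table for data values 0 to n - 1.

    A signature bit of 1 allocates the pair in the order 01, a bit of 0 in the
    order 10. The bits are re-used cyclically if there are more pairs than bits.
    """
    if n % 2:
        raise ValueError("n must be even so that the values can be paired")
    bits = cycle(signature_bits)
    table = {}
    for low in range(0, n, 2):
        if next(bits) == 1:
            table[low], table[low + 1] = 0, 1
        else:
            table[low], table[low + 1] = 1, 0
    return table


print(pairing_from_signature([1, 0, 0, 1], 8))
# {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 1}
```

With the signature 1001 and data values 0 to 7, this allocation happens to coincide with
pairing (11).
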

A higher level of security can be provided by the following method. The signature, S, is
calculated as above. This is then encrypted by the private key of an asymmetric encryption
algorithm, E, to give a new signature, S_e. The RSA method provides a suitable algorithm.
S_e is embedded in the data by the coding algorithm, C.

The detection software has only the public key. The detection process consists of calculating
the signature S by the method above and reading the coded embedded signature, S_e. The
public key is then used to decode S_e, after which it can be compared with the value of S.
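
The flow can be sketched with textbook RSA on very small numbers. This is a toy
illustration only: the parameters `P`, `Q`, `E` and `D` and all function names are
assumptions introduced here, no padding is applied, and the key is far too small for real
use.

```python
# Toy RSA parameters (assumed for illustration; not secure).
P, Q = 61, 53
N = P * Q     # public modulus, 3233
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt_signature(s):
    """Authenticating side: S_e = S^D mod N, using the private key."""
    return pow(s, D, N)


def decode_signature(s_e):
    """Detecting side: recover S from S_e using only the public key."""
    return pow(s_e, E, N)


def is_authentic(calculated_s, embedded_s_e):
    """Compare the recalculated signature with the decoded embedded one."""
    return decode_signature(embedded_s_e) == calculated_s


S = 1234                          # signature calculated from the data (must be < N)
S_e = encrypt_signature(S)        # value that would be embedded by C
print(is_authentic(S, S_e))       # True
print(is_authentic(S + 1, S_e))   # False
```

The detector never needs `D`, so a party holding only the public key can check a work but
cannot produce a valid S_e for an altered one.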

Detail of Type B Authentication: detection of changes, ignoring small alterations 
likely to be imposed by legitimate processes 

If a document that is at least partly handwritten (cheque or financial document) is scanned as
a greyscale image it could be authenticated with an approximate hash function. Thus if it
were to be stored in compressed form it would be possible to check whether there had been
significant modification to the image. (If the document were to be entirely produced from
the keyboard it is likely that OCR would be used.) If a document as described above were
to be printed, the approximate hash function could indicate whether or not there had been
significant alterations.

If an image needs to be authenticated at point of delivery and before any possible
compression it could be represented by an approximate hash function. This hash function
would then only be slightly modified by compression or printing and hence could supply a
confidence level for such alterations.
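
One way such a confidence level might be derived is sketched below. The text does not
specify a formula, so everything here is an assumption made for illustration: the signature
is taken to be a short list of numbers, the distance is Euclidean, and the hypothetical
`confidence` function falls linearly from 1.0 to 0.0 at a chosen `tolerance`.

```python
import math


def signature_distance(embedded, calculated):
    """Euclidean distance between two signatures given as equal-length sequences."""
    if len(embedded) != len(calculated):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(embedded, calculated)))


def confidence(embedded, calculated, tolerance):
    """Hypothetical confidence level: 1.0 when identical, 0.0 at or beyond `tolerance`."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return max(0.0, 1.0 - signature_distance(embedded, calculated) / tolerance)


embedded = [1.50, 1.50, 0.00]       # signature read back from the work
recompressed = [1.52, 1.49, 0.00]   # recalculated after a legitimate process
tampered = [2.40, 1.10, 0.30]       # recalculated after a significant alteration

print(confidence(embedded, recompressed, tolerance=0.5))  # close to 1.0
print(confidence(embedded, tampered, tolerance=0.5))      # 0.0
```

The tolerance would have to be chosen from experience of the legitimate processes
concerned.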

Approximate Algorithm 

The approximate algorithm works similarly to the Type A method in that a hash value for
the entire image is calculated and then embedded back into the document. In the
approximate algorithm, however, the embedding cannot be on a simple scheme of
modifying least significant bits because these bits may well be modified by the sort of small
changes the algorithm is required to handle. Instead the embedding is in the form of a very
light watermark that modifies very slightly every pixel, being rather like the watermarking
for medical images where sensitivity is of the essence.

An alternative to embedding the hash value is to store the value in a database in encrypted 
form. This would allow for a more detailed image description. 

The essential feature of the approximate hashing algorithm is that it corresponds to
geometrical features of the image. This means that small changes in the image will produce
small changes in the calculated signature; notably, in the case of JPEG files and printed files
the signature will only undergo small changes. The usual emphasis in hashing algorithms, by
contrast, is for small changes in data values to produce totally distinct hashing values. It is
this distinctness that supplies the security and makes it difficult to produce a second
document with the same hashing value as an existing document.

The approximate algorithm must have its security protected by other means. One method is
to apply an asymmetric encryption to the hash value before it is written as a watermark. The
decoder then needs only the public key to verify integrity and if the security protocol is such
that the private key is only in the hands of trusted parties, the security can be high.

The hashing algorithm requires slightly different qualities according to whether it is to be
applied to files that remain wholly in the electronic domain and hence will not suffer any
change of orientation or aspect ratio, or to files that are to be interpreted in hard copy and
rescanned. These two cases are discussed below.

Electronic Files 

The hashing algorithm will in essence be a geometrical image descriptor with a high degree
of accuracy. The most straightforward descriptors available are first and second order
moments, which correspond to the notions of centre of mass and principal axes. A set of
further descriptors must be added to limit the image manipulation that can be carried out
without detection. One such set could be supplied by the use of moments derived from
orthogonal functions.
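
A minimal sketch of the first and second order moment descriptors follows. It is
illustrative only: the function name `moment_descriptor` is an assumption, the image is
taken to be a list of rows of grey values, and the principal axis is obtained from the
standard central-moment formula rather than from anything specified in the text.

```python
import math


def moment_descriptor(image):
    """Return (centre_x, centre_y, axis_angle) for an image given as rows of grey values."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image has no mass")
    # First order moments: centre of mass.
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    # Second order central moments.
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += v * dx * dx
            mu02 += v * dy * dy
            mu11 += v * dx * dy
    # Orientation of the principal axis, in radians.
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return cx, cy, angle


image = [
    [0, 0, 0, 0, 0],
    [10, 10, 10, 10, 10],
    [0, 0, 0, 0, 0],
]
print(moment_descriptor(image))   # (2.0, 1.0, 0.0)

image[1][0] = 11                  # a small change to one pixel
print(moment_descriptor(image))   # the centre of mass moves only slightly
```

A change of one grey level in one pixel shifts the centre of mass by a small fraction of a
pixel, which is the behaviour required of an approximate signature.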

A method of protecting the security whilst using a simple moment calculation would be to
divide the image into several different sets, where the sets are selected according to a user
key, and to find linear moments from each set. If linear moments were to be taken without
any other protection it would be fairly easy for a user to make alterations which left the
centre of mass of the image unaltered. However, if the moments are taken from randomly
selected subsets of an image no such possibility exists.

The sets would be selected such that groups of pixels of significant size for the document in 
question would belong to one set. Thus for documents the sets would consist of groups of 
pixels of roughly the size of a printed character. 
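
The keyed selection of sets can be sketched as follows. This is an illustration under
stated assumptions: the function names, the square tile of `block` pixels standing in for
"roughly the size of a printed character", and the use of Python's `random.Random` seeded
with the user key are all choices made here, and `random.Random` is not cryptographically
secure.

```python
import random


def keyed_block_sets(width, height, block, n_sets, key):
    """Assign every block-by-block tile to one of `n_sets` sets, driven by `key`."""
    rng = random.Random(key)
    return {
        (bx, by): rng.randrange(n_sets)
        for by in range((height + block - 1) // block)
        for bx in range((width + block - 1) // block)
    }


def set_centres(image, block, n_sets, key):
    """Centre of mass of each keyed set, or None for a set with no mass."""
    height, width = len(image), len(image[0])
    tile_set = keyed_block_sets(width, height, block, n_sets, key)
    sums = [[0.0, 0.0, 0.0] for _ in range(n_sets)]   # mass, sum of x*v, sum of y*v
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            s = sums[tile_set[(x // block, y // block)]]
            s[0] += v
            s[1] += x * v
            s[2] += y * v
    return [(sx / m, sy / m) if m else None for m, sx, sy in sums]


image = [[(x * y) % 7 for x in range(16)] for y in range(16)]
print(set_centres(image, block=4, n_sets=3, key=12345))
```

Without the key an attacker does not know which tiles share a set, so an alteration
balanced to preserve the centre of mass of the whole image is unlikely to preserve the
centre of mass of every set.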

Hard Copy Files

In the case of hard copy files the first and second order moments of the whole image would 
be used to establish scaling and orientation. All subsequent descriptors would be evaluated 
using these as a co-ordinate system. Again a set of orthogonal functions could provide 
further descriptors, or, as above, simple moments for selected subsets could be used. 
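
A co-ordinate system of this kind might be built as sketched below. This is an assumption-
laden illustration: the function name `moment_frame` is hypothetical, the unit of length is
taken to be the root-mean-square radius of the image about its centre of mass, and the
180 degree ambiguity of the principal axis is not resolved.

```python
import math


def moment_frame(image):
    """Return a function mapping pixel (x, y) into a moment-defined co-ordinate system."""
    total = float(sum(sum(row) for row in image))
    if total == 0:
        raise ValueError("image has no mass")
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    mu20 = mu02 = mu11 = 0.0
    for y, row in enumerate(image):
        for x, v in enumerate(row):
            dx, dy = x - cx, y - cy
            mu20 += v * dx * dx
            mu02 += v * dy * dy
            mu11 += v * dx * dy
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)   # orientation
    scale = math.sqrt((mu20 + mu02) / total)            # RMS radius as unit length
    if scale == 0:
        raise ValueError("image is a single point")
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def to_frame(x, y):
        dx, dy = (x - cx) / scale, (y - cy) / scale
        return dx * cos_a + dy * sin_a, -dx * sin_a + dy * cos_a

    return to_frame


image = [[0, 0, 0, 0], [0, 5, 9, 0], [0, 2, 7, 0], [0, 0, 3, 0]]
# The same image displaced by one pixel in x and y, as might occur on rescanning.
shifted = [[0] * 5] + [[0] + row for row in image]

print(moment_frame(image)(2, 1))
print(moment_frame(shifted)(3, 2))   # same point, same frame co-ordinates
```

Descriptors evaluated in this frame are unaffected by the displacement; a real rescan would
also involve resampling, for which the agreement would only be approximate.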

The appended Flow Diagram (Figure 1) summarises the essentials of the above processes.

Claims

1. A method of authenticating a digital data work, to enable an unauthorised
alteration to that work to be readily detectable, comprising the steps of:

calculating a signature for an original, unaltered version of the digital data work;

modifying the original, unaltered version of the digital data work in dependence on
the signature by embedding into it the signature to generate a modified work, such that the
unauthorised alteration to the modified work measurably changes the modified work.

2. The method of claim 1 in which a signature is calculated using only part of a digital 
data work, and the calculated signature is embedded into this part. 

3. The method of Claim 2 in which the digital data work is an image and the part of the
image from which the signature is calculated and into which the signature is embedded
contains the most significant information.

4. The method of Claim 1 in which a signature is calculated using the whole of a digital
data file, and the calculated signature is embedded into this whole, other than portions which
carry critical information which should not be altered.

5. The method of Claim 1 in which the signature is used to provide a measure of the 
extent to which acceptable changes have occurred to the digital data work. 

6. The method of Claim 5 in which the signature possesses the property that a small 
change in the data work will produce a small change in the signature so that a variation in 
signature is usable as a measure of any degradation that has occurred. 

7. The method of Claim 1 in which the signature is calculated by taking a random
selection of pixels from an image data work, selecting the pixels in such a way that anyone
attempting to alter the file would have difficulty in avoiding all of the sampling points.

8. The method of Claim 1 in which the original digital data work is modified by a
signature and an externally generated code.

9. The method of Claim 8 in which the externally generated code may form part of a 
copyright management system. 

10. The method of Claim 1 in which the distribution of the signature within the digital 
data work is selected by a cryptographic process. 

11. The method of Claim 10 in which the cryptographic process is a permutation
process.

12. The method of Claim 1 in which the data works are MPEG audio files and the 
signature value is calculated on all frames but the embedding is restricted to the reference 
frames. 

13. The method of Claim 1 in which the data works are JPEG image files and the 
embedding of the signature is carried out in the part of the file which gives the magnitude of 
the data values after the discrete cosine transformation. 

14. The method of Claim 1 in which the data work is subject to minor permissible
manipulations, in which the signature is embedded in the form of an invisible watermark.

15. The method of Claim 14 in which the minor permissible manipulations affect a large
number of data values in the watermark, but only by a very small amount, such that the
embedded signature is still derivable from the watermark and the signature, calculated from
the data in the file, would have an approximately equal value to the embedded signature.

16. The method of Claim 1 in which the signature is derived using a hashing algorithm
selected so that it is virtually impossible to find several images or audio files that generate
the same signature as that calculated from the original digital data work.

17. A method of detecting any alteration of a digital data work, to which the
authentication method of claims 1-16 has been applied, comprising the steps of:

reading the signature embedded into the modified work;

calculating a signature for the modified work;

comparing the embedded signature read from the work with the calculated signature
and determining that the modified work has been altered if the embedded signature does not
match or otherwise correspond to the calculated signature.

18. A method of confirming the integrity of a digital data work, to which the
authentication method of claims 1-16 has been applied, comprising the steps of:

(a) reading the signature embedded into the modified work;

(b) calculating a signature for the modified work;

(c) comparing the embedded signature read from the work with the calculated
signature and

(d) confirming the integrity of the modified work if the embedded signature
matches or otherwise corresponds to the calculated signature.

19. A digital data work to which the authentication method of any of claims 1-16 has
been applied.

20. A computer program operable to authenticate a digital data work using the method
defined in any of Claims 1 to 16.

21. A computer program operable to detect any alteration to a digital data work, to which
the authentication method of claims 1-16 has been applied, using the method defined in
claim 17.

22. Digital media pre-recorded with a computer program as defined in either Claim 20 or
claim 21.